TrustRadius
Hadoop

Hadoop

Overview

What is Hadoop?

Hadoop is open source software from Apache that supports distributed processing and data storage. It is popular for its scalability, its reliability, and its ability to run on commodity hardware.

Recent Reviews

TrustRadius Insights

Hadoop has been widely adopted by organizations for various use cases. One of its key use cases is in storing and analyzing log data, …

Hadoop vs. Alternatives

8 out of 10
June 05, 2019
Incentivized
It is being used at our Fortune 500 clients. It is great for storage, but it is not well understood by the business. The challenge is that …

Hadoop Review

7 out of 10
May 16, 2018
Incentivized
It is massively being used in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of …

Hadoop is pretty Badass

9 out of 10
January 04, 2018
Incentivized
Apache Hadoop is a cost effective solution for storing and managing vast amounts of data efficiently. It is dependable and works even when …

Product Demos

Installation of Apache Hadoop 2.x or Cloudera CDH5 on Ubuntu | Hadoop Practical Demo

Big Data Complete Course and Hadoop Demo Step by Step | Big Data Tutorial for Beginners | Scaler

Hadoop Tutorial For Beginners | Apache Hadoop Tutorial For Beginners | Hadoop Tutorial | Simplilearn


Product Details

Hadoop Technical Details

Operating Systems: Unspecified
Mobile Application: No

Frequently Asked Questions

Reviewers rate Data Sources highest, with a score of 8.7.

The most common users of Hadoop are from Mid-sized Companies (51-1,000 employees).

Reviews and Ratings

(270)

Community Insights

TrustRadius Insights are summaries of user sentiment data from TrustRadius reviews and, when necessary, 3rd-party data sources.

Hadoop has been widely adopted by organizations for various use cases. One key use case is storing and analyzing log data, financial data from systems like JD Edwards, and retail catalog and session data for an omnichannel experience. Users have found that Hadoop's distributed processing capabilities allow for efficient and cost-effective storage and analysis of large amounts of data. It has been particularly helpful in reducing storage costs and improving performance when dealing with massive data sets.

Hadoop also enables the creation of a consistent data store that can be integrated across platforms, making it easier for different departments within organizations to collect, store, and analyze data. Users have leveraged Hadoop to gain insights into business data, analyze patterns, and solve big data modeling problems. The user-friendly nature of Hadoop has made it accessible to users who are not necessarily experts in big data technologies.

Additionally, Hadoop is used for ETL processing, data streaming, transformation, and querying data using Hive. Its ability to serve as a large-volume ETL platform and crunching engine for analytical and statistical models has attracted users who previously relied on MySQL data warehouses. They have observed faster query performance with Hadoop compared to traditional solutions.

Another significant use case is secure storage without high costs: Hadoop stores and processes large amounts of data efficiently. Hadoop also enables parallel processing on large datasets, making it a popular choice for data storage, backup, and machine learning analytics. Organizations have found that it helps maintain and process huge amounts of data efficiently while providing high availability, scalability, and cost efficiency.

Hadoop's versatility extends beyond commercial applications: it is also used in research computing clusters to complete tasks faster using the MapReduce framework. Finally, Systems and IT departments rely on Hadoop to create data pipelines and consult on potential projects involving Hadoop. Overall, the use cases of Hadoop span industries and departments, providing valuable solutions for data collection, storage, and analysis.

Attribute Ratings

Reviews

(1-11 of 11)
Score 7 out of 10
Vetted Review
Verified User
Incentivized
[Apache Hadoop] is being used (mostly) as intended: for managing large volumes of unstructured data from our data flows, including logging and report extract, transform, and load (ETL). We are using it at a medium scale in an on-prem server delivery with Cloudera as the management platform. While I firmly believe Cloudera makes it a bit easier to manage, it obfuscates issues at times.
  • Handles large amounts of unstructured data well, for business level purposes
  • Is a good catchall because of this design, i.e. what does not fit into our vertical tables fits here.
  • Decent for large ETL pipelines and logging free-for-alls because of this, also.
  • Many, many modules and because of Apache open source, takes time to learn
  • Integration is not always seamless between the disparate pieces nor are all the pieces required.
  • Optimization can be challenging (see PSTL design)
Apache Hadoop (and its subsequent add-ons) is well-suited to larger, unstructured data flows, such as aggregation of web traffic or advertising. Geospatial algorithms and their outputs are well-suited for this kind of aggregation, as structuring that data is challenging, but leaving it unstructured and performing queries as needed is a better fit for most business models. With the advent of data science, I would expect Hadoop to fit a LOT of their initial outputs quite well.
Gene Baker | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
We are using it within my department to process large sets of data that can't be processed in a timely fashion on a single computer or node. The various modules provided with Hadoop make it easy for us to implement map-reduce and perform parallel processing on large sets of data. We have approximately 40TB of data that we run various algorithms against as we try to use the data to solve business problems and prevent fraudulent transactions.
  • Map-reduce
  • Parallel processing
  • Handles node failures
  • HDFS: distributed file system
  • More connectors
  • Query optimization
  • Job scheduling
Hadoop is easy to use. It is a scalable and cost-effective solution for working with large data sets. Hadoop accepts data from a variety of disparate data sources, such as social media feeds, structured or unstructured data, XML, text files, images, etc. Hadoop is also highly available and fault-tolerant, supporting multiple standby NameNodes. The performance of Hadoop is also good because it stores data in a distributed fashion allowing for distributed processing and lower run times. And Hadoop is open-source, making the source code available for modification if necessary. Hadoop also supports multiple languages like C/C++, Python, and Groovy.
May 16, 2018

Hadoop Review

Kartik Chavan | TrustRadius Reviewer
Score 7 out of 10
Vetted Review
Verified User
Incentivized
Hadoop is used heavily in our organization for data storage, data backup, and machine learning analytics. Managing vast amounts of data has become quite easy since the arrival of the Hadoop environment. Our department is on the verge of moving to Spark instead of MapReduce, but for now, Hadoop is being used extensively for MapReduce purposes.
  • The Hadoop distributed system is reliable.
  • High scalability
  • Open source, low cost, large community
  • Compatibility with Windows systems
  • Security needs more focus
  • Hadoop lacks real-time processing
Hadoop helps us tackle our problem of maintaining and processing a huge amount of data efficiently. High availability, scalability and cost efficiency are the main considerations for implementing Hadoop as one of the core solutions in our big-data infrastructure. Where relational databases fall short with regard to tuning and performance, Hadoop rises to the occasion and allows for massive customization leveraging the different tools and modules. We use Hadoop to input raw data and add layers of consolidation or analysis to make business decisions about disparate data points.

Bharadwaj (Brad) Chivukula | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
  • Used for Massive data collection, storage, and analytics
  • Used for MapReduce processes, Hive tables, Spark job input, and for backing up data
  • Storing Retail Catalog & Session data to enable omnichannel experience for customers, and a 360-degree customer insight
  • Having a consistent data store that can be integrated across other platforms, and have one single source of truth.
  • HDFS is reliable and solid, and in my experience with it, there are very few problems using it
  • Enterprise support from different vendors makes it easier to 'sell' inside an enterprise
  • It provides High Scalability and Redundancy
  • Horizontal scaling and distributed architecture
  • Less organizational support. Bugs need to be fixed, and outside help takes a long time to push updates
  • Not for small data sets
  • Data security needs to be ramped up
  • The NameNode has no replication, so recovering from a NameNode failure takes a lot of time
  • Less appropriate for small data sets
  • Works well for scenarios with bulk amounts of data. Offline applications can surely go for the Hadoop file system
  • It's not an instant querying software like SQL; so if your application can wait on the crunching of data, then use it
  • Not for real-time applications
August 24, 2017

Hadoop for Big Data

Vinay Suneja | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
[It was used] As a proof of concept to analyze a huge amount of data. We were building a product to analyze huge data and eventually sell that product to a utility.
  • Highly Scalable Architecture
  • Low cost
  • Can be used in a Cloud Environment
  • Can be run on commodity Hardware
  • Open Source
  • It's open source, but companies like Hortonworks and Cloudera provide enterprise support
  • Lots of scripting still needed
  • Some tools in the Hadoop ecosystem overlap
  • To analyze a huge quantity of data at a low cost. It is definitely the future.
  • Machine learning with Spark is also a good use case.
  • You can also use AWS - EMR with S3 to store a lot of data with low cost.
Mark Gargiulo | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
We needed a robust, redundant system to run multiple simultaneous jobs for our ETL pipeline. This required distributed storage space, integration with Windows AD user accounts, and the ability to expand when needed with little to no downtime.
We are using Cloudera 5.6 to orchestrate the install (along with Puppet) and manage the Hadoop cluster.
  • The distributed, replicated HDFS filesystem allows for fault tolerance and the ability to use low-cost JBOD arrays for data storage.
  • YARN with MapReduce2 gives us a job slot scheduler to fully utilize available compute resources while providing HA and resource management.
  • The Hadoop ecosystem allows for the use of many different technologies all using the same compute resources, so that your Spark, Samza, Camus, Pig and Oozie jobs can happily co-exist on the same infrastructure.
  • Without Cloudera as a management interface, the Hadoop components are much harder to manage to ensure consistency across a cluster.
  • Calculating hardware resources against job slots and resource management can be quite an exercise in finding that "sweet spot" for your applications; a more transparent way of figuring this out would be welcome.
  • A lot of the roles and management pieces are written in Java, which from an administration perspective can have their own issues with garbage collection and memory management.
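The "sweet spot" calculation mentioned in this list can be roughed out as follows. This is a minimal sketch, not Cloudera's or YARN's own sizing method: it assumes equally sized containers and that a node is limited only by memory and virtual cores. The function name, the reserved amounts, and the example node size are all hypothetical. In a real cluster the per-node limits would typically come from settings such as `yarn.nodemanager.resource.memory-mb` and `yarn.nodemanager.resource.cpu-vcores`.

```python
def containers_per_node(node_mem_mb, node_vcores, container_mem_mb,
                        container_vcores=1, reserved_mem_mb=0, reserved_vcores=0):
    """Estimate how many equally sized containers fit on one worker node.

    Hypothetical helper: memory and vcores are the only limits considered.
    """
    usable_mem = node_mem_mb - reserved_mem_mb
    usable_vcores = node_vcores - reserved_vcores
    if usable_mem <= 0 or usable_vcores <= 0:
        return 0
    by_memory = usable_mem // container_mem_mb    # limit imposed by RAM
    by_cpu = usable_vcores // container_vcores    # limit imposed by cores
    return min(by_memory, by_cpu)


if __name__ == "__main__":
    # Assumed node: 64 GB RAM, 16 vcores, with 8 GB and 2 vcores held back
    # for the operating system and Hadoop daemons.
    for container_mb in (2048, 4096, 8192):
        slots = containers_per_node(65536, 16, container_mb,
                                    reserved_mem_mb=8192, reserved_vcores=2)
        print(f"{container_mb} MB containers -> {slots} per node")
```

Trying a few container sizes shows where the limit moves from CPU to memory, which is one way to start looking for a balance before tuning against real workloads.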
Hadoop is not for the faint of heart and is not a technology per se but an ecosystem of disparate technologies sitting on top of HDFS. It is certainly powerful, but if, like me, you were handed this with no prior knowledge or experience using or administering this ecosystem, the learning curve can be significant and ongoing. Having said that, I don't think there are currently many other open source technologies that can provide the same flexibility in the "big data" arena, especially for ETL or machine learning.
February 23, 2016

Hadoop quick review

Score 9 out of 10
Vetted Review
Verified User
Incentivized
We have Hadoop pre-prod and prod clusters. Production clusters are comprised of 200 nodes. And we have realtime clusters as well. All the data will be moved to Hadoop. We use Hadoop to do machine learning and data warehousing.
  • Machine learning models: when SAS cannot process 3 years of data, Hadoop is a good tool for building the model.
  • Data warehousing is another good use case. Using Teradata is expensive.
  • A lot of people are not from a programming background, which makes Hue very important for end users when starting the Hadoop journey. Making Hue more user friendly and functional would help end users who don't have much of a programming background.
Data is growing, and growing fast. A relational database can no longer meet this requirement. Real-time applications and distributed design are required for high scalability and fault tolerance.
Tushar Kulkarni | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I have been working with Hadoop since last year. It is very user friendly. Hadoop was used by the data center management team. It allows distributed processing of huge data sets across clusters of computers using simple programming models.
  • It is robust in the sense that any big data applications will continue to run even when individual servers fail.
  • Enormous data can be easily sorted.
  • It can be improved in terms of security.
  • Since it is open source, stability issues must be improved.
Hadoop is really very useful when dealing with big data.
Mrugen Deshmukh | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User
Incentivized
I have used Hadoop for building business feeds for a telecom client. The major purpose for using Hadoop was to tackle the problem of gaining insights into the ever-growing volume of business data. We leveraged the MapReduce programming model to churn more than 30 gigabytes of data per day into actionable, aggregated data, which was further leveraged by campaign teams to design and shape marketing and by product teams to envision new customer experiences.
  • Hadoop is an excellent framework for building distributed, fault tolerant data processing systems which leverage HDFS which is optimized for low latency storage and high throughput performance.
  • Hadoop Map reduce is a powerful programming model and can be leveraged directly either via use of Java programming language or by data flow languages like Apache Pig.
  • Hadoop has a rich ecosystem of companion tools which enable easy integration for ingesting large amounts of data efficiently from various sources. For example, Apache Flume can act as a data bus which can use HDFS as a sink and integrates effectively with disparate data sources.
  • Hadoop can also be leveraged to build complex data processing and machine learning workflows, due to availability of Apache Mahout, which uses the map reduce model of Hadoop to run complex algorithms.
  • Hadoop is a batch-oriented processing framework; it lacks real-time or stream processing.
  • Hadoop's HDFS file system is not a POSIX compliant file system and does not work well with small files, especially smaller than the default block size.
  • Hadoop cannot be used for running interactive jobs or analytics.
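The small-files point in this list comes down to simple arithmetic: the NameNode tracks every file and every block in memory, and a file smaller than the block size still occupies its own block entry. The sketch below compares the same volume of data stored as one large file versus many small ones. It assumes a 128 MB block size (the default `dfs.blocksize` in Hadoop 2.x and later; Hadoop 1.x used 64 MB). The function name and the example sizes are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # assumed block size; default dfs.blocksize in Hadoop 2.x+


def namenode_objects(file_sizes_mb, block_size_mb=BLOCK_SIZE_MB):
    """Count the file and block entries the NameNode would have to track."""
    files = len(file_sizes_mb)
    # Each file needs ceil(size / block_size) blocks; a 1 MB file still needs one.
    blocks = sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)
    return files + blocks


if __name__ == "__main__":
    one_big_file = [10240]           # 10 GB stored as a single file
    many_small_files = [1] * 10240   # the same 10 GB stored as 1 MB files
    print("single file :", namenode_objects(one_big_file))
    print("small files :", namenode_objects(many_small_files))
```

The data volume is identical in both cases, but the metadata load differs by orders of magnitude, which is why small files are usually combined into larger ones before they are loaded.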
1. How large are your data sets? If your answer is a few gigabytes, Hadoop may be overkill for your needs.
2. Do you require real-time analytical processing? If yes, Hadoop's MapReduce may not be a great asset in that scenario.
3. Do you want to process data in batches and scale to terabyte-sized clusters? Hadoop is definitely a great fit for your use case.
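The MapReduce model this review relies on can be illustrated without a cluster. The sketch below is a single-process imitation in plain Python, not Hadoop's Java API: it assumes a hypothetical input format of `customer_id,date,bytes_used` per line and totals usage per customer. On a real cluster the map and reduce steps would run in parallel across nodes, and the framework would perform the shuffle.

```python
from collections import defaultdict


def map_record(line):
    """Map step: turn one raw line into (key, value) pairs."""
    customer_id, _date, bytes_used = line.split(",")
    yield customer_id, int(bytes_used)


def shuffle(pairs):
    """Shuffle step: group all values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_usage(key, values):
    """Reduce step: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_record(line))
    grouped = shuffle(pairs)
    return dict(reduce_usage(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    sample = [
        "a,2016-01-01,10",
        "b,2016-01-01,5",
        "a,2016-01-02,7",
    ]
    print(run_job(sample))
```

Because each map call sees only one record and each reduce call sees only one key, the work splits cleanly across machines. That is also why the model suits batch jobs better than interactive queries.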
Gaurav Kasliwal | TrustRadius Reviewer
Score 10 out of 10
Vetted Review
Verified User
Incentivized
I have been using Hadoop for 2 years and I really find it very useful, especially working with bigger datasets. I have used Hadoop and Mahout for my project to analyze and learn different patterns from Yelp Dataset. It was really very easy and user friendly to use.

  • Scalability. Hadoop is really useful when you are dealing with a bigger system and you want to make your system scalable.
  • Reliable. Very reliable.
  • Fast, Fast Fast!!! Hadoop really works very fast, even with bigger datasets.
  • Development tools are not that easy to use.
  • Learning curve can be reduced. As of now, some skill is a must to use Hadoop.
  • Security. In today's world, security is of prime importance. Hadoop could be made more secure to use.
Hadoop is really useful for larger datasets. It is not very useful when you are dealing with a smaller dataset.
Score 10 out of 10
Vetted Review
Verified User
Hadoop is part of the overall Data Strategy and is mainly used as a large volume ETL platform and crunching engine for proprietary analytical and statistical models. The biggest challenge for developers/users is moving from an RDBMS query approach for accessing data to a schema on read and list processing framework. The learning curve is steep upfront, but Hive and end user tools like Datameer can help to bridge the gap. Data governance and stewardship are of key importance given the fluid nature of how data is stored and accessed.
  • Gives developers and data analysts flexibility for sourcing, storing and handling large volumes of data.
  • Data redundancy and tunable MapReduce parameters to ensure jobs complete in the event of hardware failure.
  • Adding capacity is seamless.
  • Logs that are easier to read.
Not an RDBMS - not well suited for traditional BI applications.
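The shift from an RDBMS query approach to schema on read, described in this review, can be shown with a small example. In the sketch below, raw records are stored exactly as they arrive and a schema is applied only when the data is read, so two consumers can read the same data with different schemas. The sample records, field names, and the `read_with_schema` helper are all hypothetical. Hive offers a comparable idea at scale by defining tables over files that already exist.

```python
import json

# Raw events kept as-is; nothing is validated or reshaped at write time.
RAW_EVENTS = [
    '{"user": "u1", "amount": "12.50", "channel": "web"}',
    '{"user": "u2", "amount": "3", "extra": "ignored"}',
    'not json at all',
]


def read_with_schema(raw_lines, schema):
    """Apply a schema (field name -> cast function) while reading.

    Lines that cannot be parsed are skipped; missing fields become None.
    """
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(record, dict):
            continue
        yield {
            field: cast(record[field]) if field in record else None
            for field, cast in schema.items()
        }


if __name__ == "__main__":
    # Two readers, two schemas, one copy of the raw data.
    print(list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float})))
    print(list(read_with_schema(RAW_EVENTS, {"user": str, "channel": str})))
```

The trade-off is that problems in the data surface at read time rather than at load time, which is part of why the review stresses data governance and stewardship.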